@igor0 igor0 commented Dec 15, 2025

An open-source library built on the Context Engine SDK that makes diverse sources searchable across agents and apps.

Context Connectors enables users to:

  • Build indexes from Git repos (GitHub, GitLab, Bitbucket), documentation websites, or the local filesystem: index code, documentation, runbooks, schemas, configs, and more. Use DirectContext in the Context Engine SDK for custom sources.
  • Store indexes on a local filesystem for fast & simple access, or in S3 for persistent storage in production apps.
  • Search indexes via an interactive agent, MCP for AI integrations, the CLI for quick searches, or DirectContext in the Context Engine SDK for custom implementations.

igor0 added 30 commits December 15, 2025 06:19
Switch from openai() to openai.chat() to use the Chat Completions API
instead of the Responses API. The Responses API is stateful and generates
server-side IDs (fc_...) for function calls that are not persisted for
Zero Data Retention (ZDR) organizations, causing multi-step tool calls
to fail.

The Chat Completions API is stateless and works correctly with ZDR.
- OpenAI: gpt-5.2 → gpt-5-mini
- Anthropic: claude-sonnet-4-5 → claude-haiku-4-5
- Google: gemini-3-pro → gemini-3-flash-preview

Also adds Phase 10 test results documenting:
- ZDR compatibility fix (openai.chat vs openai)
- Model availability testing
- Multi-provider verification
- Add ./clients export path to package.json for programmatic API access
- Export createMCPServer, runMCPServer, MCPServerConfig from clients module
- Document Phase 11 programmatic API test results in test-results.md
Flip the default behavior: file tools (listFiles, readFile) are now
enabled by default. Use --search-only to disable them.

This is more intuitive - users get full functionality by default and
explicitly opt out when they only want the search tool.

- cmd-mcp: --search-only disables list_files/read_file tools
- cmd-agent: --search-only disables listFiles/readFile tools
- cmd-search: --search-only disables file access
…ansport

- Add mcp-http-server.ts with runMCPHttpServer() and createMCPHttpServer()
- Add mcp-serve CLI command with --port, --host, --cors, --base-path, --api-key options
- Support API key authentication via Authorization: Bearer header
- Support CORS for browser-based clients
- Update README with HTTP server documentation and examples
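A minimal sketch of how an `Authorization: Bearer` check could look. The function name `isAuthorized` and the constant-time comparison are assumptions for illustration; the commit message does not describe how the actual server validates the key.

```typescript
import { timingSafeEqual } from "node:crypto";

// Hypothetical helper (not taken from the PR): validates an
// `Authorization: Bearer <key>` header against the configured API key.
function isAuthorized(authHeader: string | undefined, expectedKey: string): boolean {
  if (!authHeader) return false;
  const match = /^Bearer\s+(.+)$/i.exec(authHeader.trim());
  if (!match) return false;
  const provided = Buffer.from(match[1] ?? "", "utf8");
  const expected = Buffer.from(expectedKey, "utf8");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return provided.length === expected.length && timingSafeEqual(provided, expected);
}
```

A constant-time comparison avoids leaking how many leading characters of a guessed key were correct; whether that matters depends on how the server is exposed.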
Updated tool descriptions for search, list_files, and read_file to be more
detailed and informative, adapting from Auggie CLI while keeping content
appropriate for context-connectors:

- Added multi-line descriptions with features and usage notes
- Included condensed regex syntax guide for searchPattern
- Clarified parameter semantics (1-based, inclusive, relative paths)
- Removed coding-specific language to support general use cases

Files modified:
- src/clients/mcp-server.ts
- src/clients/cli-agent.ts
…-specific store paths

- Rename -k, --key flag to -n, --name across all CLI commands
- Change default store location from CWD-relative .context-connectors to:
  - Linux: ~/.local/share/context-connectors (XDG Base Directory spec)
  - macOS: ~/Library/Application Support/context-connectors
  - Windows: %LOCALAPPDATA%\context-connectors
- Add CONTEXT_CONNECTORS_STORE_PATH environment variable override
- Priority order: --store-path CLI option > env var > platform default
- Update README.md with new flag, Data Storage section, and env var docs
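The priority order and platform defaults listed in this commit can be sketched as follows. The function name `resolveStorePath` and its parameters are hypothetical; only the environment variable name, the priority order, and the three default locations come from the commit message.

```typescript
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical helper: CLI option > env var > platform default.
// The platform, env, and home parameters exist only to make the sketch testable.
function resolveStorePath(
  cliStorePath: string | undefined,
  env: Record<string, string | undefined> = process.env,
  platform: string = process.platform,
  home: string = os.homedir(),
): string {
  if (cliStorePath) return cliStorePath;
  const fromEnv = env["CONTEXT_CONNECTORS_STORE_PATH"];
  if (fromEnv) return fromEnv;
  switch (platform) {
    case "darwin":
      return path.posix.join(home, "Library", "Application Support", "context-connectors");
    case "win32": {
      // Assumption: fall back to the conventional location if LOCALAPPDATA is unset.
      const base = env["LOCALAPPDATA"] ?? path.win32.join(home, "AppData", "Local");
      return path.win32.join(base, "context-connectors");
    }
    default:
      return path.posix.join(home, ".local", "share", "context-connectors");
  }
}
```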
Replace the importFromFile temp-file pattern with a direct import() call.
This is platform-neutral and avoids unnecessary filesystem operations.
Replace flat --source flag with subcommands for each source type:
- index filesystem (alias: fs)
- index github
- index gitlab
- index bitbucket
- index website

Each subcommand now shows only relevant options in --help.
Extracted shared store options into reusable helper functions.
- Make --name optional for mcp and mcp-serve commands (default: all indexes)
- Accept multiple index names with -n/--name <names...>
- Discover available indexes at startup from store
- Include available indexes in tool descriptions
- Add index_name parameter to all tools (search, list_files, read_file)
- Lazy initialization of SearchClient per index on first use
- Cache initialized clients for reuse
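Lazy per-index initialization with caching might look roughly like this. `LazyClientCache` is a hypothetical, generic stand-in; the PR's actual SearchClient wiring is not shown in the commit message.

```typescript
// Hypothetical generic cache: creates one client per index on first use
// and returns the same (possibly still pending) promise afterwards.
class LazyClientCache<T> {
  private readonly clients = new Map<string, Promise<T>>();
  private readonly create: (indexName: string) => Promise<T>;

  constructor(create: (indexName: string) => Promise<T>) {
    this.create = create;
  }

  get(indexName: string): Promise<T> {
    let client = this.clients.get(indexName);
    if (!client) {
      client = this.create(indexName);
      this.clients.set(indexName, client);
      // Drop failed initializations so a later call can retry.
      client.catch(() => this.clients.delete(indexName));
    }
    return client;
  }
}
```

Caching the promise rather than the resolved client means two concurrent tool calls for the same index share one initialization.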
Show source type, identifier, and relative sync time for each index:

  NAME          SOURCE                     SYNCED
  augment-docs  github://augmentcode/docs  1d ago
  lm-plot       github://igor0/lm-plot     1d ago
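A relative sync time such as `1d ago` could be produced by a formatter along these lines. Only the `1d ago` form appears in the output above; the function name and the other units and thresholds are assumptions.

```typescript
// Hypothetical formatter; only the "1d ago" shape is confirmed by the sample output.
function formatRelativeTime(syncedAt: Date, now: Date = new Date()): string {
  const seconds = Math.max(0, Math.floor((now.getTime() - syncedAt.getTime()) / 1000));
  if (seconds < 60) return "just now";
  const minutes = Math.floor(seconds / 60);
  if (minutes < 60) return `${minutes}m ago`;
  const hours = Math.floor(minutes / 60);
  if (hours < 24) return `${hours}h ago`;
  return `${Math.floor(hours / 24)}d ago`;
}
```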
- Change SourceMetadata to discriminated union with typed config per source
- Store original ref (branch/tag) separately from resolvedRef (SHA)
- Enable future re-indexing by preserving all source parameters
- Add getSourceIdentifier() and getResolvedRef() helper functions
- Maintain backward compatibility with legacy format indexes
- Update all sources, consumers, and tests
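To illustrate the discriminated-union idea, here is a small sketch. The field names are hypothetical and the union is reduced to three source types; `getSourceIdentifier()` is named in the commit, but this body is only a guess shaped to match the `github://augmentcode/docs` identifiers in the list output.

```typescript
// Hypothetical, simplified shapes; the real SourceMetadata likely differs.
type SourceMetadata =
  | { type: "github"; owner: string; repo: string; ref: string; resolvedRef: string }
  | { type: "website"; url: string }
  | { type: "filesystem"; rootPath: string };

// The `type` tag lets the compiler narrow to the matching config in each branch.
function getSourceIdentifier(meta: SourceMetadata): string {
  switch (meta.type) {
    case "github":
      return `github://${meta.owner}/${meta.repo}`;
    case "website":
      return meta.url;
    case "filesystem":
      return meta.rootPath;
  }
}
```

Keeping `ref` (for example a branch name) next to `resolvedRef` (the SHA) is what allows a later re-index to follow the branch rather than staying pinned to the old commit.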
- sync <name> updates a single index using stored config
- sync --all updates all indexes
- Keeps 'index' command name (clearer than 'add')
- Add explicit path format examples with ✅/❌ to prevent /repo confusion
- Add example output for search results
- Use consistent 'repository root' terminology
- Add clearer parameter documentation with inline defaults
- Improve regex guidance with specific unsupported pattern examples
igor0 added 22 commits January 4, 2026 21:56
- Use loadState() instead of load() for tests that need full state
- Use loadSearch() for tests that need search-only state
- Change CLI option from -n, --name to -i, --index
- Named indexes now stored in {basePath}/indexes/{key}/ instead of {basePath}/{key}/
- File-based mode (no --index) still stores directly in basePath
- Update list() to look in indexes/ subdirectory
- Add -i/--index option as alias for -n/--name in both commands
- When no --index is provided, load directly from {store-path}/search.json
- Use '.' as key for file-based mode (files go directly in store-path)
- Update error messages to show correct location when index not found
- Fix filesystem.test.ts to use indexes/ subdirectory structure
- Move 'list' and 'delete' under 'index' command (index list, index delete)
- Restructure MCP commands: 'mcp' -> 'mcp local', 'mcp-serve' -> 'mcp remote'
- Remove obsolete 'sync' command (replaced by 'index github --index <n>')
- Consolidate cmd-list.ts, cmd-delete.ts, cmd-mcp-serve.ts into their parent commands
- Update main entry point to reflect new command structure
- Updated Indexer to call context.export() with 'full' and 'search-only' modes
- Updated IndexStore.save() to accept both fullState and searchState
- Updated FilesystemStore, MemoryStore, S3Store to save both state files
- SearchClient now uses loadSearch() for search-only state
- Removed manual blob stripping - SDK handles export modes natively
- Updated tests to use new dual-state pattern
- Add 'local' command with 'list' and 'delete' subcommands for managing local indexes
- Add s3-config.ts helper to read S3 config from CC_S3_* environment variables
…deduplication

- Add --save-content debug option to save crawled content for inspection
- Add progress reporting during upload and indexing phases
- Show clear summary of new vs unchanged files
- Reuse previous context state for client-side deduplication
- Cache crawl results to avoid redundant re-crawling
…rogress

Update to use new SDK progress API that provides separate uploaded and
indexed counters instead of a single stage-dependent processed counter.
- Add unified --index <specs...> option for mcp local, mcp remote, and agent
- Support index spec formats: name, path:/path, s3://bucket/key
- Add CompositeStoreReader for routing to different stores based on specs
- Add MultiIndexRunner for shared multi-index client management
- Extract tool descriptions to shared tool-descriptions.ts module
- Agent command now supports multiple indexes like MCP commands
- All tools include index_name parameter when multiple indexes specified
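The three spec formats (name, `path:/path`, `s3://bucket/key`) could be parsed roughly as below. `parseIndexSpec` and the `IndexSpec` type are hypothetical names, and the validation rules are assumptions.

```typescript
// Hypothetical parsed form of an --index spec.
type IndexSpec =
  | { kind: "name"; name: string }
  | { kind: "path"; path: string }
  | { kind: "s3"; bucket: string; key: string };

function parseIndexSpec(spec: string): IndexSpec {
  if (spec.startsWith("s3://")) {
    const rest = spec.slice("s3://".length);
    const slash = rest.indexOf("/");
    // Require both a bucket and a non-empty key.
    if (slash <= 0 || slash === rest.length - 1) {
      throw new Error(`Invalid S3 index spec: ${spec}`);
    }
    return { kind: "s3", bucket: rest.slice(0, slash), key: rest.slice(slash + 1) };
  }
  if (spec.startsWith("path:")) {
    const path = spec.slice("path:".length);
    if (!path) throw new Error(`Invalid path index spec: ${spec}`);
    return { kind: "path", path };
  }
  if (!spec) throw new Error("Index spec must not be empty");
  return { kind: "name", name: spec };
}
```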
- Use -i, --index <spec> with same format as other commands
- Remove deprecated -n/--name and --store options
- Remove --path override (source path comes from index metadata)
Named indexes now always use the default path (~/.augment/context-connectors).
Users can use path:/custom/path for custom locations.
- Remove --search-only (not meaningful for search command)
- Add --raw flag for raw search results
- Default behavior uses searchAndAsk() to answer questions via LLM
- Add searchAndAsk() method to SearchClient
- Add runtime console warnings when binding to non-localhost interfaces
- Warn that HTTP traffic is unencrypted and API keys are transmitted in cleartext
- Suggest production alternatives: TLS reverse proxy, VPN, SSH tunneling

- Add Security Considerations section to README with:
  - Caddy and nginx reverse proxy examples
  - SSH tunneling instructions
  - Network isolation guidance
  - API key generation recommendations
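A sketch of the kind of check that could drive the non-localhost warning. The names `isLoopbackHost` and `bindWarning`, and the warning wording, are assumptions; the commit does not show how the server decides that an interface is non-local.

```typescript
// Hypothetical check: treats "localhost", 127.0.0.0/8, and ::1 as loopback.
function isLoopbackHost(host: string): boolean {
  const h = host.trim().toLowerCase();
  return h === "localhost" || h === "::1" || h === "[::1]" || /^127(\.\d{1,3}){3}$/.test(h);
}

// Returns a warning for non-loopback binds, or null when no warning is needed.
function bindWarning(host: string): string | null {
  if (isLoopbackHost(host)) return null;
  return `Binding to ${host}: HTTP traffic is unencrypted and API keys are sent in cleartext. ` +
    "Consider a TLS reverse proxy, a VPN, or an SSH tunnel.";
}
```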
Align CLI naming with MCP SDK transport terminology:
- 'mcp local' is now 'mcp stdio'
- 'mcp remote' is now 'mcp http'

Updated all README examples to use new command names.
- Add header: 'Context Connectors Minimal Agent'
- Add two newlines after each agent turn for better readability
- Only show tool calls in verbose mode
- Run interactively by default, non-interactively with --print flag
- Query is now an optional positional argument (works in both modes)
- In interactive mode with query: asks query first, then prompts
- In --print mode without query: exits with error
@igor0 igor0 marked this pull request as ready for review January 8, 2026 21:25
@augment-app-staging augment-app-staging bot left a comment

Review completed. 4 suggestions posted.

* @param key - The index key/name
* @returns The stored IndexState (without blobs), or null if not found
*/
loadSearch(key: string): Promise<IndexState | null>;

loadSearch() is documented as returning the search-optimized state without blobs, but the type is Promise<IndexState | null>; this makes it easy to accidentally treat a search-only state as a full IndexState (and e.g. assume contextState.blobs exists). Using IndexStateSearchOnly here would better reflect the actual contract.

Other Locations
  • context-connectors/src/stores/filesystem.ts:81
  • context-connectors/src/stores/memory.ts:40
  • context-connectors/src/stores/s3.ts:141
  • context-connectors/src/clients/search-client.ts:20
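The typing concern can be illustrated with simplified, hypothetical shapes. `IndexStateSearchOnly` is the type the review suggests; the fields shown here are invented for the example and are not the SDK's real definitions.

```typescript
// Hypothetical, simplified shapes for illustration only.
interface ContextStateSearchOnly { checkpointId: string }
interface ContextStateFull extends ContextStateSearchOnly { blobs: string[] }
interface IndexStateSearchOnly { contextState: ContextStateSearchOnly }
interface IndexState { contextState: ContextStateFull }

interface IndexStoreReader {
  loadState(key: string): Promise<IndexState | null>;
  // Returning the narrower type makes "no blobs" visible to the compiler.
  loadSearch(key: string): Promise<IndexStateSearchOnly | null>;
}

function blobCount(state: IndexState): number {
  return state.contextState.blobs.length;
}

// With the narrower return type, passing a loadSearch() result to blobCount()
// is a compile-time error instead of a runtime "blobs is undefined" failure.
```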

]);
}

async delete(key: string): Promise<void> {

delete() removes only state.json; search.json is left behind, so list() can continue to return this key and search clients may read stale state. This can make deletions appear to “not work” in S3-backed setups.
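One way to address this, sketched against a hypothetical storage interface: delete both state files for the key. `ObjectDeleter`, `deleteIndex`, and the `{prefix}{key}/{file}` object layout are assumptions, not the S3Store implementation.

```typescript
// Hypothetical minimal storage interface (stands in for an S3 client wrapper).
interface ObjectDeleter {
  deleteObject(objectKey: string): Promise<void>;
}

// Removes both the full state and the search-only state for an index.
async function deleteIndex(storage: ObjectDeleter, prefix: string, key: string): Promise<void> {
  const files = ["state.json", "search.json"];
  await Promise.all(files.map((file) => storage.deleteObject(`${prefix}${key}/${file}`)));
}
```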

});

stream.pipe(parser);
parser.on("close", resolve);

In downloadTarball(), the promise only rejects on stream errors; if the tar parser emits an error (corrupt/partial archive), this can hang indefinitely waiting for close. Consider handling parser errors as well so failures surface reliably.

Other Locations
  • context-connectors/src/sources/gitlab.ts:253
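A minimal sketch of rejecting on parser errors as well as stream errors. It treats both objects as plain event emitters, which is an assumption; the helper name `waitForParserClose` is hypothetical.

```typescript
import { EventEmitter } from "node:events";

// Hypothetical helper: settles when the parser closes, or rejects if either
// the source stream or the parser reports an error.
function waitForParserClose(stream: EventEmitter, parser: EventEmitter): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    stream.once("error", reject);
    parser.once("error", reject);
    parser.once("close", () => resolve());
  });
}
```

Because a promise settles only once, a late `close` after an `error` is harmless.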

removed.push(file.filename);
} else if (file.status === "added" || file.status === "modified" || file.status === "renamed") {
// Download file contents
const contents = await this.getFileContents(file.filename, currentRef);

fetchChanges() downloads and indexes changed files without applying the same .augmentignore/.gitignore + shouldFilterFile() checks used in full indexing, so incremental updates can index files that full indexing would intentionally skip (binary/oversized/ignored). This can lead to inconsistent indexes and occasional ingest errors.

Other Locations
  • context-connectors/src/sources/gitlab.ts:404
  • context-connectors/src/sources/bitbucket.ts:455
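A sketch of applying the same skip rules during incremental updates. `partitionChanges` and the `shouldSkip` predicate are hypothetical; in the PR the predicate would presumably combine the ignore files with `shouldFilterFile()`, whose signature is not shown here.

```typescript
type ChangedFile = { filename: string; status: "added" | "modified" | "renamed" | "removed" };

// Hypothetical partitioning step run before any file contents are downloaded.
function partitionChanges(
  files: ChangedFile[],
  shouldSkip: (path: string) => boolean,
): { toFetch: string[]; removed: string[]; skipped: string[] } {
  const toFetch: string[] = [];
  const removed: string[] = [];
  const skipped: string[] = [];
  for (const file of files) {
    if (file.status === "removed") {
      removed.push(file.filename);
    } else if (shouldSkip(file.filename)) {
      skipped.push(file.filename); // same rule set as full indexing
    } else {
      toFetch.push(file.filename);
    }
  }
  return { toFetch, removed, skipped };
}
```

Size- and content-based filters can only run after download, so a real implementation would likely need a second check at that point.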

igor0 commented Jan 9, 2026

augment review

@augment-app-staging augment-app-staging bot left a comment

Review completed. 7 suggestions posted.

*/
private async readFileRawBuffer(path: string, ref: string): Promise<Buffer | null> {
try {
const url = `${this.baseUrl}/repositories/${this.workspace}/${this.repo}/src/${encodeURIComponent(ref)}/${encodeURIComponent(path)}`;

encodeURIComponent(path) will encode / as %2F, which likely breaks Bitbucket’s /src/{commit}/{path} endpoint for nested paths (files in subdirectories may become unreadable).
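A common fix is to encode each path segment separately so the `/` separators survive. The helper name `encodePathSegments` is hypothetical.

```typescript
// Encodes each segment of a repository-relative path but keeps "/" separators.
function encodePathSegments(path: string): string {
  return path.split("/").map((segment) => encodeURIComponent(segment)).join("/");
}
```

For a nested path, `encodeURIComponent("src/a.ts")` yields `src%2Fa.ts`, while `encodePathSegments("src/a.ts")` leaves it as `src/a.ts`.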

try {
// Clone with depth 1 for efficiency, then checkout the specific ref
execSync(`git clone --depth 1 --branch ${ref} "${cloneUrl}" "${tempDir}"`, {

ref here is a commit SHA (from resolveRefToSha()), but git clone --branch expects a branch/tag name; this clone step is likely to fail, and the follow-up git fetch origin ${ref} may also be rejected by the remote.
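One possible alternative, sketched under the assumption that the remote permits fetching a commit by SHA (many hosted services do, but it is a server-side setting): initialize an empty repository, fetch the single commit, and check out `FETCH_HEAD`. The helper names are hypothetical, and `execFileSync` is used instead of a shell string so the ref and URL are never interpolated into a shell command.

```typescript
import { execFileSync } from "node:child_process";

// Hypothetical: builds the git argument lists for a shallow fetch of one commit.
function shallowFetchCommands(cloneUrl: string, sha: string): string[][] {
  // Accept SHA-1 (40 hex chars) or SHA-256 (64 hex chars) object names.
  if (!/^[0-9a-f]{40}([0-9a-f]{24})?$/i.test(sha)) {
    throw new Error(`Not a full commit SHA: ${sha}`);
  }
  return [
    ["init", "--quiet"],
    ["remote", "add", "origin", cloneUrl],
    ["fetch", "--depth", "1", "origin", sha],
    ["checkout", "--quiet", "FETCH_HEAD"],
  ];
}

// Runs the commands in an existing, empty target directory.
function checkoutCommit(cloneUrl: string, sha: string, targetDir: string): void {
  for (const args of shallowFetchCommands(cloneUrl, sha)) {
    execFileSync("git", args, { cwd: targetDir, stdio: "inherit" });
  }
}
```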

});

// Download tarball
const response = await fetch(url);

This fetch(url) doesn’t include GitHub auth headers; for private repos the tarball endpoint/redirect is likely to 404/403, causing full indexing to fail.
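A sketch of building request headers that include a token when one is available. `buildGitHubHeaders` and the User-Agent value are hypothetical, and where the token comes from (for example an environment variable) is an assumption.

```typescript
// Hypothetical helper: headers for GitHub REST API requests, with optional auth.
function buildGitHubHeaders(token?: string): Record<string, string> {
  const headers: Record<string, string> = {
    Accept: "application/vnd.github+json",
    "User-Agent": "context-connectors-example",
  };
  if (token) {
    headers["Authorization"] = `Bearer ${token}`;
  }
  return headers;
}

// Usage sketch: const response = await fetch(url, { headers: buildGitHubHeaders(token) });
```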

const parseBody = (req: IncomingMessage): Promise<unknown> => {
return new Promise((resolve, reject) => {
let body = "";
req.on("data", (chunk) => (body += chunk));

parseBody buffers the entire request body with no size limit; if this server is exposed, a large POST could cause memory exhaustion/DoS.
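A size-limited variant could look like the sketch below. The 1 MiB default and the error message are assumptions, and the parameter is typed as `Readable` (which `IncomingMessage` extends) only to keep the example easy to exercise.

```typescript
import { Readable } from "node:stream";

const MAX_BODY_BYTES = 1024 * 1024; // assumption: 1 MiB limit

// Sketch: buffers a JSON body but aborts once the limit is exceeded.
function parseBody(req: Readable, maxBytes: number = MAX_BODY_BYTES): Promise<unknown> {
  return new Promise((resolve, reject) => {
    const chunks: Buffer[] = [];
    let received = 0;
    req.on("data", (chunk: Buffer) => {
      received += chunk.length;
      if (received > maxBytes) {
        req.destroy(); // stop reading further data
        reject(new Error("Request body too large"));
        return;
      }
      chunks.push(chunk);
    });
    req.on("end", () => {
      try {
        const text = Buffer.concat(chunks).toString("utf8");
        resolve(text ? JSON.parse(text) : undefined);
      } catch (err) {
        reject(err);
      }
    });
    req.on("error", reject);
  });
}
```

A real server would typically also answer with HTTP 413 before closing the connection.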

if (output.length <= maxLength) {
return { text: output, truncated: false };
}
const truncateAt = maxLength - TRUNCATION_MESSAGE.length;

If maxOutputLength is smaller than TRUNCATION_MESSAGE.length, truncateAt becomes negative and slice(0, truncateAt) can produce unexpected output (not a clean truncation).
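Clamping the slice index avoids the negative value. In this sketch the wording of `TRUNCATION_MESSAGE` and the function name `truncateOutput` are assumptions; the final `slice` also keeps the result within `maxLength` when the message alone is longer than the limit.

```typescript
const TRUNCATION_MESSAGE = "\n[output truncated]"; // assumed wording

function truncateOutput(output: string, maxLength: number): { text: string; truncated: boolean } {
  if (output.length <= maxLength) {
    return { text: output, truncated: false };
  }
  const limit = Math.max(0, maxLength);
  // Clamp so a small maxLength never yields a negative slice index.
  const truncateAt = Math.max(0, limit - TRUNCATION_MESSAGE.length);
  const text = (output.slice(0, truncateAt) + TRUNCATION_MESSAGE).slice(0, limit);
  return { text, truncated: true };
}
```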

contextAfter: number
): { lineNumbers: Set<number>; matchingLines: Set<number> } {
const flags = caseSensitive ? "g" : "gi";
const regex = new RegExp(pattern, flags);

new RegExp(pattern, flags) isn’t guarded; an invalid user-supplied regex pattern will throw and crash the tool call instead of returning a structured error.
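Wrapping compilation in a try/catch lets the tool return a structured error. The `RegexResult` type and `compilePattern` name are hypothetical; the flag selection mirrors the snippet under review.

```typescript
type RegexResult = { ok: true; regex: RegExp } | { ok: false; error: string };

// Compiles a user-supplied pattern without letting a SyntaxError escape.
function compilePattern(pattern: string, caseSensitive: boolean): RegexResult {
  const flags = caseSensitive ? "g" : "gi";
  try {
    return { ok: true, regex: new RegExp(pattern, flags) };
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    return { ok: false, error: `Invalid regex pattern: ${reason}` };
  }
}
```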

// Load metadata for available indexes
const indexes: IndexInfo[] = [];
for (const name of indexNames) {
const state = await store.loadSearch(name);

If store.list() includes an index name but loadSearch(name) is missing (e.g., partial state), indexNames still includes it while indexes metadata omits it, so the server may advertise an index that later errors when used.
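One way to keep the advertised names and the loaded metadata in sync is to derive both from the indexes that actually loaded. The interfaces and the name `discoverUsableIndexes` are hypothetical simplifications of the store and state types.

```typescript
interface IndexInfo { name: string; sourceType: string }

// Hypothetical minimal reader; the real store returns a richer state object.
interface SearchStateReader {
  loadSearch(name: string): Promise<{ sourceType: string } | null>;
}

// Keeps only indexes whose search state loads; reports the rest as skipped.
async function discoverUsableIndexes(
  store: SearchStateReader,
  indexNames: string[],
): Promise<{ indexes: IndexInfo[]; skipped: string[] }> {
  const indexes: IndexInfo[] = [];
  const skipped: string[] = [];
  for (const indexName of indexNames) {
    const state = await store.loadSearch(indexName);
    if (!state) {
      skipped.push(indexName); // listed by the store, but not loadable
      continue;
    }
    indexes.push({ name: indexName, sourceType: state.sourceType });
  }
  return { indexes, skipped };
}
```

The server would then advertise `indexes.map((i) => i.name)` rather than the raw list from the store.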
